4 research outputs found

    Automated Knowledge Extraction from IS Research Articles Combining Sentence Classification and Ontological Annotation

    Manually analyzing large collections of research articles is a time- and resource-intensive activity, making it difficult to stay on top of the latest research findings. Automated solutions are limited by a lack of domain knowledge and by their inability to attribute extracted key terms to a focal article, related work, or background information. We aim to address this challenge by (1) developing a framework for classifying sentences in scientific publications, (2) performing several experiments comparing state-of-the-art sentence transformer algorithms with a novel few-shot learning technique, and (3) automatically analyzing a corpus of articles and evaluating automated knowledge extraction capabilities. We tested our approach for combining sentence classification with ontological annotations on a manually created dataset of 1,000 sentences from Information Systems (IS) articles. The results indicate a high degree of accuracy, underlining the potential for novel approaches in analyzing scientific publications.

    Extracting Causal Claims from Information Systems Papers with Natural Language Processing for Theory Ontology Learning

    The number of scientific papers published each year is growing exponentially. How can computational tools help scientists better understand and process this data? This paper presents a software prototype that automatically extracts causes, effects, signs, moderators, mediators, conditions, and interaction signs from propositions and hypotheses of full-text scientific papers. The prototype uses natural language processing methods and a set of linguistic rules for causal information extraction. It is evaluated on a manually annotated corpus of 270 Information Systems papers from the AIS basket of eight, containing 723 hypotheses and propositions. F1 scores for the detection and extraction of different causal variables range between 0.71 and 0.90. The presented automatic causal theory extraction allows for the analysis of scientific papers based on a theory ontology and therefore contributes to the creation and comparison of inter-nomological networks.

    Hey Article, What Are You About? Question Answering for Information Systems Articles through Transformer Models for Long Sequences

    Question Answering (QA) systems can significantly reduce the manual effort of searching for relevant information. However, challenges arise from a lack of domain specificity and the fact that QA systems usually retrieve answers from short text passages instead of long scientific articles. We aim to address these challenges by (1) exploring the use of transformer models for long sequence processing, (2) performing domain adaptation for the Information Systems (IS) discipline, and (3) developing novel techniques by performing domain adaptation in multiple training phases. Our models were pre-trained on a corpus of 2 million sentences retrieved from 3,463 articles from the Senior Scholars' Basket and fine-tuned on SQuAD and a manually created set of 500 QA pairs from the IS field. In six experiments, we tested two transfer learning techniques for fine-tuning (TANDA and FANDO). The results show that fine-tuning with task-specific domain knowledge considerably increases the models' F1 and Exact Match scores.
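
    The F1 and Exact Match scores named in this abstract are the usual metrics for SQuAD-style extractive QA. The following is a minimal sketch of how they are commonly computed for one predicted answer against one reference answer, assuming SQuAD-style normalization (lowercasing, stripping punctuation and English articles). The function names and example strings are illustrative assumptions and are not taken from the paper.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical answers, for illustration only.
    print(exact_match("The Technology Acceptance Model", "technology acceptance model"))
    print(token_f1("acceptance model", "technology acceptance model"))
```

    Exact Match rewards only answers that agree fully after normalization, while token-level F1 gives partial credit for overlapping answer spans, which is why the two are typically reported together.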

    How Best to Hunt a Mammoth - Toward Automated Knowledge Extraction From Graphical Research Models

    In the Information Systems (IS) discipline, central contributions of research projects are often represented in graphical research models, clearly illustrating constructs and their relationships. Although thousands of such representations exist, methods for extracting this source of knowledge are still in an early stage. We present a method for (1) extracting graphical research models from articles, (2) generating synthetic training data, (3) performing object detection with a neural network, (4) reconstructing the graph with a dedicated algorithm, and (5) storing the results in a designated research model format. We trained YOLOv7 on 20,000 generated diagrams and evaluated its performance on 100 manually reconstructed diagrams from the Senior Scholars' Basket. The results for extracting graphical research models show an F1-score of 0.82 for nodes, 0.72 for links, and an accuracy of 0.72 for labels, indicating the applicability for supporting the population of knowledge repositories contributing to knowledge synthesis.